Boy, what did I get myself into? Rand has chained me to my desk and ordered me to summarize the next set of videos released by Matt Cutts. This new batch of summaries is more verbatim than the last one, so enjoy.
1. Does Google treat dynamic pages differently than static pages?
Google ranks static and dynamic pages in similar ways. PageRank flows to dynamic URLs the same way it flows to static URLs. Matt provides an example: if you have the NY Times linking to a dynamic URL, you'll still get the PageRank benefit, and that PageRank will still flow onward from the dynamic page.
There are other search engines that in the past have said, "Okay, we'll go one level deep from static URLs, so we're not gonna crawl from a dynamic URL, but we're willing to go into the dynamic URL space from a static URL." So, the short answer is that PageRank still flows the same between static and dynamic.
Matt provides a more detailed answer as well. The example the question asker gave had five parameters, and one of them was a product ID of "2725." Matt maintains that you definitely can use too many parameters. He would opt for two or three at the most if you have any choice whatsoever. Also, try to avoid long numbers, because Google can think those are session IDs. It's a good idea to get rid of any extra parameters.
Remember that Google is not the only search engine out there, so if you have the ability to basically say "I'm gonna use a little bit of mod_rewrite, and I'm gonna make it look like a static URL," that can often be a very good way to tackle the problem. PageRank still flows, but experiment. If you don't see any URLs that have the same structure or the same number of parameters as the ones you're thinking about using, it's probably better to either cut back on the number of parameters, shorten them somehow, or try to use mod_rewrite.
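For anyone who wants to see what that mod_rewrite approach can look like, here is a minimal .htaccess sketch. The URL pattern, the product.php script, and the parameter names are made up for illustration; adapt them to whatever your own dynamic URLs actually look like.

```apache
# Hypothetical rewrite: serve the static-looking URL /sweaters/2725/
# from the real dynamic script product.php?category=sweaters&id=2725
RewriteEngine On
RewriteRule ^([a-z-]+)/([0-9]+)/?$ product.php?category=$1&id=$2 [L,QSA]
```

With a rule like this, links can point at the short, static-looking URL, PageRank flows to it as usual, and the number of visible parameters drops to zero.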
2. I have a friend whose site was hacked, and he didn't know about it for a couple of months because Google had taken it out or something like that. Can Google inform the webmaster of this occurrence? Basically, when your site gets hacked, can Google inform someone within Sitemaps that maybe inappropriate pages were crawled?
Matt guesses that Google doesn't have the resources to do something like that right now. In general, when somebody's site is hacked, if he has a small number of sites he's monitoring, he'll usually notice it pretty quickly, or else the web host will alert him to it. The Sitemaps team is always willing to work on new things, but Matt guesses that this would be on the low end of the priority list.
3. I'd like to use geo targeting software to deliver different marketing messages to people in different parts of the world (e.g., a discounted pricing structure). Are we safe to run with this sort of plain vanilla use of geo targeting software? Clearly, we want to avoid any suspicion of cloaking.
The way that Google defines cloaking is very specific. Cloaking is defined as "showing different content to users than you show to search engines." Geo targeting by itself is not cloaking under Google's guidelines, because you're saying, "Take the IP address. Oh, you're from Canada (or Germany, or whatever). We'll show you this particular page."
The thing that will get you in trouble is if you treat Googlebot in some special way. If you're geo targeting by country, don't make up a special country just for Googlebot (Matt offers "Googlebotstan" as an example). Treat Googlebot just like a regular user: if you geo target by country and Googlebot is coming from an IP address that's in the United States, just give it whatever United States users would see. Google itself does geo targeting, for example, and it's not considered to be cloaking. Just treat Googlebot like you would any other user with that IP address, and you should be fine.
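To make that concrete, here is a rough PHP sketch of geo targeting done the way Matt describes. The lookup_country_from_ip() helper and the pricing pages are hypothetical stand-ins for whatever geo IP database and templates a site actually uses; the point is that the decision is based only on the IP address, with no user-agent check anywhere, so Googlebot sees exactly what any other visitor from the same IP would see.

```php
<?php
// Hypothetical helper: map an IP address to a country code using whatever
// geo IP database the site already has (stubbed out here for illustration).
function lookup_country_from_ip($ip) {
    // ... real geo IP lookup would go here ...
    return 'US';
}

// Every visitor, Googlebot included, is handled by IP address alone.
// There is deliberately no check of $_SERVER['HTTP_USER_AGENT'].
$country = lookup_country_from_ip($_SERVER['REMOTE_ADDR']);

if ($country == 'CA') {
    include 'pricing_canada.html';   // Canadian pricing
} elseif ($country == 'DE') {
    include 'pricing_germany.html';  // German pricing
} else {
    include 'pricing_us.html';       // default: what US visitors (and a US-based Googlebot) see
}
?>
```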
Because people joked that in Matt's videos it looks like he's been kidnapped (he had previously been answering questions in front of a blank wall), for the next series of videos he hung the closest thing he had to a map of the world behind him. In this case, the closest thing to a map was a poster of "Language Families of the World." Matt reads the map and then says, "Did you know that there are over 5,000 languages spoken across the earth? How many does Google support? Only about a hundred. Still a ways to go."
4. One of my clients is going to acquire a domain name that's very closely related to his business, and it has a lot of links going to it. He basically wants to do a 301 redirect from it to the final website after the acquisition. Will Google ban him or apply a penalty for doing this 301 redirect?
In general, probably not. You should be okay, because you specify that it's very closely related. Any time there's an actual merger of two businesses or two domains that are very close to each other, doing a 301 should be no problem whatsoever. If, however, you are a music site and all of a sudden you are acquiring links from debt consolidation sites, that could raise a few eyebrows. But it sounds like this is just a run-of-the-mill sort of thing, so you should be okay.
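For what it's worth, the mechanics of that kind of site-wide 301 are simple. Here is a minimal Apache .htaccess sketch for the acquired domain, with olddomain.com and newdomain.com standing in for the real names; every URL on the old domain is permanently redirected to the same path on the final website.

```apache
# Hypothetical example: 301 every URL on the acquired domain
# to the matching URL on the destination site.
RewriteEngine On
RewriteCond %{HTTP_HOST} ^(www\.)?olddomain\.com$ [NC]
RewriteRule ^(.*)$ http://www.newdomain.com/$1 [R=301,L]
```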
5. What's the best way to theme a site using directories? Do you put your main keyword in a directory or on the index page? If using directories, do you use a directory for each set of keywords?
Matt thinks that the question asker is thinking too much about keywords and not enough about site architecture. He prefers a treelike architecture so everything branches out in nice, even paths.
It's also good if things are broken down by topic. If you're selling clothes, you might have sweaters as one directory and shoes as another directory. If you do that sort of thing, your keywords do end up in directories.
As far as directories vs. the actual name of the HTML file, it doesn't really matter that much within Google's scoring algorithm. If you break things down by topic and make sure those topics match well with the keywords you expect your users to type in when they try to find your page, you should be in pretty good shape.
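As a rough illustration of the treelike, topic-based structure Matt describes (the clothing categories and file names here are just made-up examples):

```
www.example.com/
    sweaters/
        index.html          <- category page targeting "sweaters"
        wool-sweater.html
    shoes/
        index.html          <- category page targeting "shoes"
        running-shoes.html
```

Each topic gets its own directory, the keyword naturally shows up in the path, and every page stays only a couple of clicks from the home page.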
6. If an e-commerce site's URL has too many parameters and it is un-indexable, is it acceptable under the Google guidelines to serve static HTML pages to the bot to index instead?
This is something to be very careful about, because if you're not, you could end up veering into cloaking. Again, cloaking is showing different content to users than to Googlebot. You want to show the exact same content to users as you do to Googlebot.
Matt's advice would be to go back to that question he previously answered about dynamic parameters in URLs. See if there's a way to unify it so the users and Google both see the same directory. If you can do something like that, that's going to be much better.
If not, you want to make sure that whatever HTML pages you do show, users who go to the same page don't get redirected. They need to see the exact same page that Googlebot saw. That's the main criterion of cloaking, and that's where you'll have to be careful.
7. I would like to use A/B split testing on my static HTML site. Will Google understand my PHP redirect for what it is, or will they penalize my site for perceived cloaking? If this is a problem, is there a better way to split test?
Matt suggests split testing in an area where search engines aren't going to index it. Any time Google goes to a page and sees different content, or reloads the page and sees different content, that will look a little strange.
If you can, it's better to use robots.txt or .htaccess files or something to make sure that Google doesn't index your A/B testing. If not, Matt recommends not using a PHP redirect. He recommends using something server side to actually serve up the two pages in place, as sketched below.
The one thing to be careful about is not doing anything special for Googlebot. Just treat it like a regular user. That's gonna be the safest thing in terms of not being treated like cloaking.
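A minimal sketch of what that server-side approach might look like in PHP, assuming two hypothetical variant files; the variant is included in place, so no redirect is ever issued and the user agent is never inspected:

```php
<?php
// Hypothetical A/B test: pick a variant and serve it in place.
// No redirect is sent and the user agent is never checked, so Googlebot
// is treated exactly like any other visitor.
// (A real test would usually remember the assignment in a cookie so a
// returning user keeps seeing the same variant.)
if (rand(0, 1) == 0) {
    include 'landing_variant_a.html';   // variant A
} else {
    include 'landing_variant_b.html';   // variant B
}
?>
```

If the variants instead live at their own URLs, a couple of Disallow lines in robots.txt for those test URLs covers Matt's other suggestion of keeping the experiment out of the index entirely.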
8. Aw heck, how about a real question? Ginger or Mary Anne?
“Iβm gonna go Mary Anne.”
9. Should I be worried about this? site:tableandhome.com returns 10,000 results; site:tableandhome.com -intitle:buy returns 100,000 results, all supplemental.
In general, no, don't worry about this. Matt then explains the concept of the beaten path. If there's a problem with a one-word search at Google, that's a big deal. If it's a 20-word search, that's obviously less of a big deal, because it's off the beaten path. The supplemental results team takes reports very seriously and acts very quickly on them, but in general, something in the supplemental results is a little further off the beaten path than the main web results.
Once you start getting into negation, or negation with a special operator like intitle:, etc., that's pretty far off the beaten path. And you're talking about result estimates here, so not actual web results but the estimate of the number of results.
The good news is that there are a couple of things that will make the site: estimates more accurate. There are at least two changes that Matt knows of in Google's infrastructure: one is deliberately trying to make site: results more accurate, and the other is a change to the infrastructure to improve overall quality that, as a side benefit, will count the number of results from a site more accurately when supplemental results are involved.
There are at least a couple of changes that might make things more accurate, but in general, once you start to get really far off the beaten path (-intitle:, etc.), especially with supplemental results, don't worry too much about the results estimates. Historically, Google hasn't worried that much about them because not that many people have been interested. But they do hear more people expressing curiosity about the subject, so they're putting a little more effort into it.
10. I have a question about redirects. I have one or more pages that have moved on various websites. I use classic ASP and [have been given a response of a 301]. These redirects have been set up for quite a while, and when I run a spider on them it handles the redirect fine, yet Google still seems to show the old URLs.
This is probably an instance where you're seeing this happen in the supplemental results. Matt posits that there's a main web results Googlebot and a supplemental results Googlebot. The next time a supplemental results Googlebot visits that page and sees the 301, it will index it accordingly and refresh, and things will go fine.
Historically, the supplemental results have included a lot of extra data but have not been refreshed as fast as the main web results. Anybody can verify this by pulling up a cached page: the crawl dates vary. The good news is that the supplemental results are getting fresher and fresher, and there's an effort underway to make them quite fresh.
11. I'd like to know more about the supplemental index. It seems that while you were on vacation, many sites got put there. I have one site where this happened. It has a PageRank of 6, and it has been in the supplemental results since late May.
There is new infrastructure in the supplemental results. Matt mentioned that in a blog post, and while he doesn't know how many people noticed it, he's certainly said it before. ("I think it was in the indexing timeline, in fact.")
As Google refreshes the supplemental results and starts to use new indexing infrastructure for them, the net effect is that things will be a little fresher. Matt is sure that he has some URLs in the supplemental results himself, so he wouldn't worry about it that much.
Over the course of the summer, the supplemental results team will take all the different reports they see, especially things off the beaten path, like site: and other operators that are kind of esoteric, and they'll work on making sure those return the sort of results that everybody naturally expects.